The Architecture of SciDB
Abstract
SciDB is an open-source analytical database oriented toward the data management needs of scientists. As such, it mixes statistical and linear algebra operations with data management ones, using a natural nested multi-dimensional array data model. We have been working on the code for two years, most recently with the help of venture capital backing. Release 11.06 (June 2011) is downloadable from our website (SciDB.org). This paper presents the main design decisions of SciDB. It focuses on our decisions concerning a high-level, SQL-like query language, the issues facing our query optimizer and executor, and efficient storage management for arrays. The paper also discusses the implementation of features not usually present in DBMSs, including version control, uncertainty, and provenance.
Similar resources
D4M and Large Array Databases for Management and Analysis of Large Biomedical Imaging Data
Advances in medical imaging technologies have enabled the acquisition of increasingly large datasets. Current state-of-the-art confocal or multi-photon imaging technology can produce biomedical datasets in excess of 1 TB per dataset. Typical approaches for analyzing large datasets rely on downsampling the original datasets or leveraging distributed computing resources where small subsets of ima...
A Demonstration of SciDB: A Science-Oriented DBMS
In CIDR 2009, we presented a collection of requirements for SciDB, a DBMS that would meet the needs of scientific users. These included a nested-array data model, science-specific operations such as regrid, and support for uncertainty, lineage, and named versions. In this paper, we present an overview of SciDB’s key features and outline a demonstration of the first version of SciDB on data and o...
Performance Comparison of Big-Data Technologies in Locating Intersections in Satellite Ground Tracks
The performance and ease of extensibility for two Big-Data technologies, SciDB and Hadoop/MapReduce (HD/MR), are evaluated on identical hardware for an Earth science use case of locating intersections between two NASA remote sensing satellites’ ground tracks. SciDB is found to be 1.5 to 2.5 times faster than HD/MR. The performance of HD/MR approaches that of SciDB as the data size or the cluste...
Evaluation of SciDB for Image and Video Processing Tasks
In this report, we evaluate the use of SciDB, an array database, for image processing workloads. We explore the representation of images in SciDB, select four specific image-processing tasks to implement in SciDB, and evaluate them on an AWS cluster using a dataset of 1000 high-resolution images. Even with a simple and non-optimized MPI baseline, we find SciDB to be an order of magnitude slo...
SciDB DBMS Research at M.I.T
This paper presents a snapshot of some of our scientific DBMS research at M.I.T. as part of the Intel Science and Technology Center on Big Data. We focus our efforts primarily on SciDB, although some of our work can be used for any backend DBMS. We summarize our work on making SciDB elastic, providing skew-aware join strategies, and producing scalable visualizations of scientific data.
Publication date: 2011